Hessian Free Deep Learning

Author

  • Subodh Iyengar
Abstract

Optimization techniques play an important role in training neural networks for regression and classification tasks. First-order methods such as gradient descent have predominantly been used, since second-order methods, such as Newton's method, are computationally infeasible. However, second-order methods show much better convergence characteristics than first-order methods because they also take the curvature of the error surface into account. Additionally, first-order methods require extensive, application-specific tuning of the step-size parameter, tend to get trapped in local optima, and exhibit slow convergence. These drawbacks make Newton-type methods especially attractive for training networks with deep architectures. The source of Newton's method's infeasibility is the computation of the Hessian matrix, which takes prohibitively long. Influential work by Pearlmutter [2] led to a method for using the Hessian without actually computing it. Recent work [1] trained a deep network consisting of a number of Restricted Boltzmann Machines with Newton's method, without directly computing the Hessian matrix, in a form of "Hessian-free" learning. The method was successful on the MNIST handwriting recognition data set when used to train a Restricted Boltzmann Machine using Hinton's [3] method, yielding a better-quality solution for classification tasks. The proposed work for the CS229 project aims to improve upon the "Hessian-free" (HF) learning method and apply it to different classification tasks. To do this, the Hessian-free learning method will be implemented and the MNIST experiments will be replicated. Through this analysis, further modifications will be proposed to improve the method, which will also be run on additional classification tasks.
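
The key enabling idea referenced above is that the product of the Hessian with an arbitrary vector can be obtained without ever forming the Hessian itself. As a rough illustration (not the project's implementation, and not Pearlmutter's exact R-operator), the sketch below approximates H·v by a finite difference of gradients on a toy logistic-regression loss; the model, data, and function names are all assumptions made for the example.

```python
import numpy as np

def loss_grad(theta, X, y):
    """Gradient of a toy logistic-regression loss (illustrative model only)."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y)

def hessian_vector_product(theta, v, X, y, eps=1e-5):
    """Approximate H @ v without forming H, via a finite difference of gradients:
    H v  ~=  (grad(theta + eps*v) - grad(theta)) / eps."""
    return (loss_grad(theta + eps * v, X, y) - loss_grad(theta, X, y)) / eps

# Tiny usage example on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (rng.uniform(size=100) < 0.5).astype(float)
theta = np.zeros(5)
v = rng.normal(size=5)
print(hessian_vector_product(theta, v, X, y))
```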

Similar Resources

Deep learning via Hessian-free optimization

We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn’t limited in applicability to autoencoders, or...
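
For context, the core of the Hessian-free approach described above is to solve the Newton system H d = -g only approximately, with truncated conjugate gradient, so that the Hessian enters exclusively through Hessian-vector products. The following is a minimal sketch of that inner loop, with an explicit positive-definite matrix standing in for the implicit Hessian; it is not the authors' implementation, which also involves damping, preconditioning, and termination heuristics.

```python
import numpy as np

def conjugate_gradient(hvp, g, max_iters=50, tol=1e-6):
    """Approximately solve H d = -g using only Hessian-vector products
    (the inner loop of Hessian-free optimization, i.e. truncated CG)."""
    d = np.zeros_like(g)
    r = -g.copy()          # residual of H d + g = 0 at d = 0
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Usage: an explicit PSD matrix plays the role of the (implicit) Hessian.
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 10))
H = A @ A.T + 10 * np.eye(10)
g = rng.normal(size=10)
d = conjugate_gradient(lambda v: H @ v, g)
print(np.linalg.norm(H @ d + g))  # should be close to zero
```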

Improved Preconditioner for Hessian Free Optimization

We investigate the use of Hessian Free optimization for learning deep autoencoders. One of the critical components in that algorithm is the choice of the preconditioner. We argue in this paper that the Jacobi preconditioner leads to faster optimization and we show how it can be accurately and efficiently estimated using a randomized algorithm.
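
One standard way to obtain such a diagonal (Jacobi) preconditioner from Hessian-vector products alone is the Rademacher-sampling estimator E[v ⊙ Hv] = diag(H); the paper's exact estimator may differ, so the sketch below should be read only as an illustration of the general idea, with illustrative names throughout.

```python
import numpy as np

def estimate_diagonal(hvp, dim, num_samples=100, rng=None):
    """Randomized estimate of diag(H) using only Hessian-vector products:
    E[v * (H v)] = diag(H) when the entries of v are i.i.d. Rademacher (+/-1)."""
    if rng is None:
        rng = np.random.default_rng(0)
    diag_est = np.zeros(dim)
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        diag_est += v * hvp(v)
    return diag_est / num_samples

# Usage: the estimate could serve as a Jacobi preconditioner in the CG inner loop.
rng = np.random.default_rng(2)
A = rng.normal(size=(20, 20))
H = A @ A.T + np.eye(20)
diag_h = estimate_diagonal(lambda v: H @ v, dim=20, rng=rng)
print(np.max(np.abs(diag_h - np.diag(H))))  # error shrinks as num_samples grows
```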

Investigations on hessian-free optimization for cross-entropy training of deep neural networks

Context-dependent deep neural network HMMs have been shown to achieve recognition accuracy superior to Gaussian mixture models in a number of recent works. Typically, neural networks are optimized with stochastic gradient descent. On large datasets, stochastic gradient descent improves quickly during the beginning of the optimization. But since it does not make use of second order information, ...
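
As a point of reference for the comparison above, stochastic gradient descent applies one first-order update per mini-batch of the cross-entropy objective, with no curvature information; a Hessian-free step would instead replace the raw gradient with a CG-computed Newton-type direction. A minimal, illustrative sketch follows (toy binary model and names, not the paper's speech-recognition setup).

```python
import numpy as np

def cross_entropy_grad(theta, X, y):
    """Gradient of a cross-entropy loss on one mini-batch
    (toy binary logistic model standing in for a deep network)."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y)

def sgd(theta, data_batches, lr=0.1):
    """Plain SGD: one first-order update per mini-batch."""
    for X, y in data_batches:
        theta = theta - lr * cross_entropy_grad(theta, X, y)
    return theta

# Usage on random mini-batches.
rng = np.random.default_rng(4)
batches = [(rng.normal(size=(32, 5)), (rng.uniform(size=32) < 0.5).astype(float))
           for _ in range(10)]
theta = sgd(np.zeros(5), batches)
print(theta)
```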

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of ...
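
To illustrate what a block-diagonal curvature approximation means in practice, the sketch below wraps a full Hessian-vector product so that cross-block terms are discarded, letting each block be handled independently; the grouping of parameters into blocks and all names are assumptions made for the example, not the paper's implementation.

```python
import numpy as np

def block_diagonal_hvp(hvp, blocks):
    """Build a Hessian-vector product that keeps only the block-diagonal
    part of H: cross-block curvature terms are discarded, so each block's
    CG sub-problem can be solved independently."""
    def bd_hvp(v):
        out = np.zeros_like(v)
        for idx in blocks:                 # idx: index array for one block
            masked = np.zeros_like(v)
            masked[idx] = v[idx]           # zero the vector outside the block
            out[idx] = hvp(masked)[idx]    # keep only this block's rows
        return out
    return bd_hvp

# Usage: two parameter blocks standing in for two layers of a network.
rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
H = A @ A.T + np.eye(6)
blocks = [np.arange(0, 3), np.arange(3, 6)]
bd = block_diagonal_hvp(lambda v: H @ v, blocks)
v = rng.normal(size=6)
print(bd(v))  # equals the block-diagonal part of H applied to v
```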

Publication date: 2010